ABSTRACT


Very often applied econometricians carry out their investigations
by assuming linearity when a nonlinear model would be more appropriate.
This renders the OLS estimates both biased and inconsistent. In this
paper, we obtain a simple upper bound for the inconsistency. We also
suggest a test for nonlinearity.


THE  USE  OF  LINEAR  APPROXIMATION  TO  NONLINEAR  REGRESSION  ANALYSIS 

A.  K.  BERA 
Department  of  Economics,  University  of  Illinois 

I.   INTRODUCTION 

We consider the following nonlinear regression equation

\[ y_i = f(x_i, \alpha) + u_i, \quad i = 1, \ldots, n, \tag{1} \]

where $y_i \in \mathbb{R}$ and $x_i \in \chi \subset \mathbb{R}^K$ are observations on dependent and fixed
independent variables, $f(\cdot)$ is the response function which is assumed
to be differentiable with respect to $x$ up to a finite order, $u_i$ is a
random variable with $E(u_i) = 0$, $E(u_i^2) = \sigma^2$, $E(u_i u_j) = 0$ for $i \neq j$, and
$\alpha$ is the unknown parameter vector.

In practical situations one cannot expect to have precise knowledge
about $f(\cdot)$. So a common procedure is to estimate a linear version
of (1) like

\[ y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \ldots, n, \tag{2} \]

with the usual assumptions for $\varepsilon_i$.

When (2) is the true model, OLS applied to (2) gives consistent,
unbiased and efficient estimates. None of these desirable properties
survives when (1) is the true model. However, these OLS estimates of
$\beta$ are not entirely useless. In a recent pioneering article White [4]
demonstrated that the OLS estimates of the parameters of the linear
approximation model are inconsistent and found bounds for the inconsistency.
In particular, he showed that the extent of the inconsistency depends
directly on the degree of concavity of the underlying nonlinear function
and the skewness and range of the independent variable, and inversely
on the variance of the independent variable. Using the OLS estimates
he also devised a test for functional misspecification. When
$f(\cdot)$ is known, the OLS estimates can be successfully used to estimate $\alpha$
consistently for a certain class of functions; see Bera [1] and Byron and Bera [3].

This paper has two distinct aims: firstly, to obtain a simpler
bound for inconsistency for the OLS estimates of $\beta$ when (1) is the true
model and secondly, to develop a new test for functional misspecification
which we shall call a test for nonlinearity.

II.   A  BOUND  FOR  INCONSISTENCY  OF  OLS  ESTIMATES 
By taking a first order Taylor series approximation of $f(\cdot)$ around
the origin, (1) can be written as

\[ y_i = x_i'\beta + R_i + u_i, \quad i = 1, \ldots, n, \tag{3} \]

where $\beta$ is some function of $\alpha$ and $R_i$ is the remainder term. Let $\hat\beta$ be
the OLS estimate of $\beta$ from (3) without the remainder $R = (R_1, \ldots, R_n)'$,
i.e.,

\[ \hat\beta = (X'X)^{-1}X'Y, \]

where $X = (x_1, \ldots, x_n)'$ and $Y = (y_1, \ldots, y_n)'$. It follows that

\[ \operatorname{plim} \hat\beta = \beta + \lim \left(\frac{X'X}{n}\right)^{-1} \left(\frac{X'R}{n}\right), \]

assuming the existence of the limit on the right hand side.

Now define a vector norm of any arbitrary vector $z$ ($K \times 1$) by
$\|z\| = \left(\sum_{i=1}^{K} z_i^2\right)^{1/2}$, and a matrix norm of a matrix $A$ by
$\|A\| = \sup_{\|z\| \le 1} \|Az\|$, which is equal to the square root of the
maximum eigenvalue of $AA^*$, where $A^*$ is the conjugate transpose of $A$.

Using the continuity property of $\|\cdot\|$ and assuming compactness of $\chi$,
we have

\[ \|\operatorname{plim} \hat\beta - \beta\| \le \lim \frac{1}{\sqrt{n}} \left\| \left(\frac{X'X}{n}\right)^{-1} X' \right\| \cdot \lim \frac{1}{\sqrt{n}} \|R\|. \tag{4} \]
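The decomposition behind (4) can be checked numerically in a finite sample. The sketch below is only an illustration under assumed inputs: the response function exp(x), the uniform design and the sample size are hypothetical choices, not taken from the paper. It verifies that the noise-free OLS gap equals $(X'X/n)^{-1}(X'R/n)$, that its norm is bounded by the product of the two norms in (4), and that the matrix norm coincides with the square root of the largest eigenvalue of $AA^*$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 1.0, size=n)            # hypothetical fixed regressor
X = np.column_stack([np.ones(n), x])

# Assumed response f(x) = exp(x); a first-order Taylor expansion at 0
# gives beta = (1, 1)' and remainder R_i = exp(x_i) - 1 - x_i.
beta = np.array([1.0, 1.0])
R = np.exp(x) - X @ beta

# Noise-free OLS gap: beta_hat - beta = (X'X/n)^{-1} (X'R/n)
A = np.linalg.inv(X.T @ X / n) @ X.T         # (X'X/n)^{-1} X'
gap = A @ R / n

# Finite-sample analogue of inequality (4)
lhs = np.linalg.norm(gap)
rhs = (np.linalg.norm(A, 2) / np.sqrt(n)) * (np.linalg.norm(R) / np.sqrt(n))

# Matrix norm as the square root of the largest eigenvalue of A A*
spectral = np.sqrt(np.linalg.eigvalsh(A @ A.conj().T).max())
```

Here `np.linalg.norm(A, 2)` returns the largest singular value of `A`, which is the sup-norm defined in the text.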


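Individual terms in $R$ are (noting that we take derivatives only with
respect to the last $K-1$ nonconstant independent variables)

\[ R_i = \frac{1}{2} (x_{i2}, \ldots, x_{iK}) \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_{i2}^2} & \cdots & \dfrac{\partial^2 f}{\partial x_{i2}\,\partial x_{iK}} \\ \vdots & & \vdots \\ \dfrac{\partial^2 f}{\partial x_{iK}\,\partial x_{i2}} & \cdots & \dfrac{\partial^2 f}{\partial x_{iK}^2} \end{pmatrix} \begin{pmatrix} x_{i2} \\ \vdots \\ x_{iK} \end{pmatrix} = \frac{1}{2}\, \underline{x}_i' D_i \underline{x}_i \ \text{(say)}, \quad i = 1, \ldots, n, \]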


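where $D_i$ is evaluated at $x_i^* \in (0, \underline{x}_i)$. Using the fact that for any
symmetric matrix $A$

\[ \sup_{z} \frac{z'Az}{z'z} = \lambda_{\max}, \]

where $\lambda_{\max}$ is the maximum eigenvalue of $A$, we have

\[ R_i \le \frac{1}{2}\, \underline{x}_i' \underline{x}_i \lambda_i, \quad i = 1, \ldots, n, \]

where $\lambda_i$ is the maximum eigenvalue of $D_i$. That is

\[ R \le \frac{1}{2} \operatorname{diag}(\underline{X}\,\underline{X}')\, \lambda \]

or

\[ \|R\| \le \frac{1}{2} \|\operatorname{diag}(\underline{X}\,\underline{X}')\| \cdot \|\lambda\|, \tag{5} \]

where $\underline{X} = (\underline{x}_1, \ldots, \underline{x}_n)'$ and $\lambda = (\lambda_1, \ldots, \lambda_n)'$.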
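As an illustration of (5), consider a response function that is exactly quadratic, so that the Hessian $D_i$ is the same matrix for every observation. The sketch below assumes a hypothetical positive semidefinite Hessian `H` and a random design. Neither comes from the paper, and positive semidefiniteness is imposed so that the remainders are nonnegative and the elementwise bound carries over to the norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                   # k nonconstant regressors
Xu = rng.uniform(-1.0, 1.0, size=(n, k))       # rows play the role of the underlined x_i

# Hypothetical quadratic response f(x) = b'x + 0.5 x'Hx with H positive semidefinite,
# so that D_i = H for every i and each remainder R_i is nonnegative.
M = rng.normal(size=(k, k))
H = M @ M.T

# Remainders R_i = 0.5 * x_i' H x_i
R = 0.5 * np.einsum("ij,jk,ik->i", Xu, H, Xu)

lam = np.full(n, np.linalg.eigvalsh(H).max())  # lambda_i = largest eigenvalue of D_i
sq_norms = np.einsum("ij,ij->i", Xu, Xu)       # diagonal of Xu Xu'

elementwise_bound = 0.5 * sq_norms * lam       # R_i <= 0.5 * x_i'x_i * lambda_i
# The matrix norm of a diagonal matrix is its largest absolute diagonal entry.
norm_bound = 0.5 * sq_norms.max() * np.linalg.norm(lam)
```

For a general response function the eigenvalues $\lambda_i$ differ across observations and are unknown, which is why the bound is of theoretical rather than directly computable interest.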

Combining (4) and (5) we have

\[ \|\operatorname{plim} \hat\beta - \beta\| \le \frac{1}{2} \lim \frac{1}{\sqrt{n}} \left\| \left(\frac{X'X}{n}\right)^{-1} X' \right\| \cdot \lim \|\operatorname{diag}(\underline{X}\,\underline{X}')\| \cdot \lim \frac{1}{\sqrt{n}} \|\lambda\|, \tag{6} \]

where the terms involving $X$ represent variability of the independent
variables and $\|\lambda\|$ gives the degree of curvature. This result is similar
to White's Theorem 1. To see it more clearly, assume that there is only
one independent variable. Then

\[ |\operatorname{plim} \hat\beta - \beta| \le \frac{\bar{x}^2}{2} \lim \left(\frac{1}{n} \sum_{i=1}^{n} x_i^2\right)^{-1/2} \cdot \lim \frac{1}{\sqrt{n}} \|\lambda\|, \]

where $\bar{x}$ is the largest value of $x$. So the degree of inconsistency
depends directly on the range of $x$ and the amount of curvature of the
underlying function and inversely on the variance of $x$. The difference
between White's and our results is that White's bounds additionally
depend on the skewness of $x$. However, our derivation is simpler.

If we have some knowledge about $f(\cdot)$ the bound in (6) can be made
tighter with appropriate scaling of data. After scaling let the new
data set be

\[ \tilde{X} = XS, \]

where $S = \operatorname{diag}(1, 1/d, \ldots, 1/d)$ with $d > 1$. The purpose of scaling is to
bring all the observations on independent variables within the $(-1, 1)$
interval. Then (6) can be written as

\[ \|\operatorname{plim} \hat\beta - \beta\| \le \frac{1}{2} \lim \frac{1}{\sqrt{n}} \left\| \left(\frac{X'X}{n}\right)^{-1} X' \right\| \cdot \lim \|\operatorname{diag}(\underline{X}\,\underline{X}')\| \cdot \frac{1}{d} \lim \frac{1}{\sqrt{n}} \|\tilde\lambda\|, \]

where $\tilde\lambda = (\tilde\lambda_1, \ldots, \tilde\lambda_n)'$ is the same as $\lambda$ but now the matrix $D_i$ is
evaluated at $\tilde{x}_i^* \in (0, \underline{\tilde{x}}_i)$ with $\underline{\tilde{x}}_i = (\tilde{x}_{i2}, \ldots, \tilde{x}_{iK})'$. It is clear that $\underline{\tilde{x}}_i$ is
nearer to the origin compared to $\underline{x}_i$. Now if the function has less
curvature around the origin, then $\|\tilde\lambda\| < \|\lambda\|$ and the bound will become much
tighter.
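The scaling step itself is mechanical and can be sketched as follows. The data, the response and the rule for choosing `d` (slightly above the largest absolute regressor value) are hypothetical; the text only requires d > 1 and scaled observations inside (-1, 1).

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 100, 3
# Hypothetical design: a constant plus K-1 regressors on a wide range
X = np.column_stack([np.ones(n), rng.uniform(-5.0, 5.0, size=(n, K - 1))])
y = rng.normal(size=n)                       # placeholder response for the fit below

# Assumed rule for d: just above the largest absolute nonconstant value
d = 1.01 * np.abs(X[:, 1:]).max()
S = np.diag([1.0] + [1.0 / d] * (K - 1))     # S = diag(1, 1/d, ..., 1/d)
X_scaled = X @ S

# OLS on the scaled data only rescales the slope coefficients by d
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b_scaled, *_ = np.linalg.lstsq(X_scaled, y, rcond=None)
```

The fitted values are unchanged by the scaling; only the coefficient units and the point at which the curvature is evaluated move.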

III.   A  TEST  FOR  NONLINEARITY 

It is difficult to devise a test that is applicable to all forms of
functions. Therefore, we will restrict ourselves to the class of functions
in which $f(\cdot)$ is monotone in $x$ and there is a finite (possibly
zero) number of inflection points throughout the whole range of $x$.

Following White [4, p. 155] we define a parameter vector $\beta^*$ which
minimizes

\[ \theta(\beta) = [f(X, \alpha) - X\beta + u]'[f(X, \alpha) - X\beta + u]. \]


Sometimes we have some idea about a function in terms of its inflection
points, e.g., most production functions do not have any inflection
point whereas growth curves like logistic and Gompertz have one
inflection point.

Now if there are $p$ inflection points we partition $\chi$ into $(p+2)$ disjoint
sets, i.e., find $\chi_1, \ldots, \chi_{p+2}$ such that

\[ \bigcup_{i=1}^{p+2} \chi_i = \chi \quad \text{and} \quad \chi_i \cap \chi_j = \phi, \quad i \neq j, \quad i, j = 1, \ldots, p+2, \tag{7} \]

where $\phi$ is a null set. Then define

\[ \theta(\beta^{i*}) = \inf_{x \in \chi_i} \{\theta(\beta^{i})\}, \quad i = 1, \ldots, p+2. \]


It is postulated that testing the null hypothesis $H_0$: $f(\cdot)$ is
linear is equivalent to testing $H_0$: $\beta^{1*} = \cdots = \beta^{p+2*}$. This equivalence
is quite straightforward. However, the major problem is to find
an appropriate partition of $\chi$ satisfying (7). In practice, it will
be difficult to satisfy the second part of (7). Therefore, instead of
trying to get a disjoint partition of $\chi$ we partition the index set $I =
\{1, \ldots, n\}$.

For simplicity assume there is no inflection point, in which case
we partition $I$ into two sets only. First, we order all the observations
in ascending order of $y$, i.e., permute $I$ in such a way that

\[ y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}. \]

Then choose $I_1 = \{1, \ldots, n_0\}$ and $I_2 = \{n_0+1, \ldots, n\}$ such that

\[ \sum_{i \in I_1} (y_i - x_i'\beta^1)^2 + \sum_{i \in I_2} (y_i - x_i'\beta^2)^2 \]

is minimum for $K < n_0 \le n-K$. This means we are fitting two "best"
linear regression lines based on $n_0$ and $(n-n_0)$ observations respectively.
This will induce a partition in the observation space $\chi$ and
for our sample observations we will get a partition of the matrix $X$.
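The search for the dividing point described above can be sketched as a brute-force loop over candidate split sizes. The function names and the piecewise-linear example data are hypothetical, and each candidate is refitted from scratch, which is simple but costly for large samples.

```python
import numpy as np

def ssr(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return float(e @ e)

def find_dividing_point(X, y):
    """Sort observations by y and return (n0, order), where n0 minimizes the
    combined SSR of two separate OLS fits on the first n0 and last n - n0 rows."""
    n, K = X.shape
    order = np.argsort(y, kind="stable")
    Xs, ys = X[order], y[order]
    best_n0, best_total = None, np.inf
    for n0 in range(K + 1, n - K + 1):         # candidates with K < n0 <= n - K
        total = ssr(Xs[:n0], ys[:n0]) + ssr(Xs[n0:], ys[n0:])
        if total < best_total:
            best_n0, best_total = n0, total
    return best_n0, order

# Hypothetical example: an increasing piecewise-linear response with a kink at x = 0.5
x = np.linspace(0.0, 1.0, 40)
y = np.where(x < 0.5, x, 0.5 + 3.0 * (x - 0.5))
X = np.column_stack([np.ones_like(x), x])
n0, order = find_dividing_point(X, y)
```

In this noise-free example 20 of the 40 grid points lie below the kink, so the two fitted lines are exact when the split isolates those 20 points.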

We denote the partition of $X$ as $X = (X_1', X_2')'$. Our test statistic
is based on the difference $\hat\beta^1 - \hat\beta^2$ and is defined as

\[ \psi = (\hat\beta^1 - \hat\beta^2)'\, V^{-1} (\hat\beta^1 - \hat\beta^2), \]

where

\[ V = (X_1'X_1)^{-1} V_1 (X_1'X_1)^{-1} + (X_2'X_2)^{-1} V_2 (X_2'X_2)^{-1} \]

and

\[ V_\ell = \sum_{i \in I_\ell} (y_i - x_i'\hat\beta^\ell)^2\, x_i x_i', \quad \ell = 1, 2. \]

Using Theorem 3 of White [4, p. 156], it can be shown that under $H_0$,
$\psi$ follows asymptotically a chi-square ($\chi^2_K$) distribution with $K$ degrees
of freedom. If $\psi$ is larger than the tabulated $\chi^2_K$ value we reject
$H_0$ at the chosen significance level.
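A minimal sketch of the statistic follows, assuming the two subsamples have already been formed. The helper names, the exponential example and the midpoint split are hypothetical choices made for illustration; the text selects the split by minimizing the combined residual sum of squares. The critical value 5.991 is the tabulated 5% point of a chi-square distribution with 2 degrees of freedom.

```python
import numpy as np

def ols(X, y):
    """OLS coefficient vector."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def sandwich(X, y, b):
    """(X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1} for one subsample."""
    e = y - X @ b
    XtX_inv = np.linalg.inv(X.T @ X)
    V_l = (X * e[:, None] ** 2).T @ X          # sum of e_i^2 * x_i x_i'
    return XtX_inv @ V_l @ XtX_inv

def nonlinearity_stat(X1, y1, X2, y2):
    """Quadratic form in the difference of the two subsample OLS estimates."""
    b1, b2 = ols(X1, y1), ols(X2, y2)
    V = sandwich(X1, y1, b1) + sandwich(X2, y2, b2)
    diff = b1 - b2
    return float(diff @ np.linalg.solve(V, diff))

# Hypothetical nonlinear data, ordered by y and split at the midpoint
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0.0, 2.0, size=n)
y = np.exp(x) + rng.normal(scale=0.05, size=n)
order = np.argsort(y)
X, y = np.column_stack([np.ones(n), x])[order], y[order]
n0 = n // 2                                    # assumed split, not the SSR-minimizing one
psi = nonlinearity_stat(X[:n0], y[:n0], X[n0:], y[n0:])
reject = psi > 5.991                           # 5% critical value, K = 2
```

Because the split is chosen from the data in the actual procedure, the asymptotic chi-square reference distribution should be read as the paper's claim rather than something this sketch verifies.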

To assess the performance of this test we consider the CES production
function and its two linear approximations, the Cobb-Douglas and translog
functions,

\[ \text{(CES)} \quad \ln F_i = -\frac{\alpha_1}{\alpha_2} \ln\left(\alpha_3 L_i^{-\alpha_2} + \alpha_4 K_i^{-\alpha_2}\right) + u_i, \]

\[ \text{(Cobb-Douglas)} \quad \ln F_i = \beta_1 + \beta_2 \ln L_i + \beta_3 \ln K_i + \varepsilon_i, \]

and

\[ \text{(translog)} \quad \ln F_i = \beta_1 + \beta_2 \ln L_i + \beta_3 \ln K_i + \beta_4 (\ln L_i)^2 + \beta_5 (\ln K_i)^2 + \beta_6 \ln L_i \cdot \ln K_i + \varepsilon_i, \quad i = 1, \ldots, n, \]

where $F_i$, $L_i$ and $K_i$ are output, labor and capital inputs for the i-th
firm. The data were generated as in White [4, p. 151]. We do
not make much attempt to compare our results with White's reported
results. His test is based on the difference between the OLS and weighted
least squares estimates and it depends on the choice of weights.
Different choices of weights, as presented in his paper, might lead to
conflicting decisions. Our results are reported in Table 1.

TABLE 1

PARAMETER ESTIMATES FOR TWO DATA SETS AND THE TEST STATISTICS
(Sample size = 200)

                               Cobb-Douglas                  Translog
                          $\hat\beta^1$   $\hat\beta^2$    $\hat\beta^1$   $\hat\beta^2$
Constant                    -.14437      -.08495         -.12537      -.08207
ln L                         .17689       .35433          .18645       .34766
ln K                         .27423       .43049          .29165       .45836
(ln L)^2                                                  .01156       .03284
(ln K)^2                                                 -.00841      -.03780
ln L . ln K                                              -.12351      -.05670
Dividing point ($n_0$)              108                         108
Test statistic ($\psi$)          210.89242                   207.39276

For both the approximations the sample was divided almost at the
mid point and $H_0$ was rejected decisively. A number of other nonlinear
functions, together with their linear approximations, were tried and
this test procedure was found to have very high power.
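The relation between the CES function and its two approximations can be illustrated by generating CES data and building the two design matrices. Everything numerical in the sketch below (input ranges, parameter values, noise scale, seed) is a hypothetical choice; it does not reproduce White's design or the figures in Table 1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200

# Hypothetical inputs and CES parameters, chosen only for illustration
L = rng.uniform(1.0, 5.0, size=n)
K = rng.uniform(1.0, 5.0, size=n)
a1, a2, a3, a4 = 1.0, 2.0, 0.5, 0.5
u = rng.normal(scale=0.05, size=n)
lnF = -(a1 / a2) * np.log(a3 * L ** (-a2) + a4 * K ** (-a2)) + u

lnL, lnK = np.log(L), np.log(K)
X_cd = np.column_stack([np.ones(n), lnL, lnK])             # Cobb-Douglas regressors
X_tl = np.column_stack([X_cd, lnL**2, lnK**2, lnL * lnK])  # translog regressors

def ssr(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return float(e @ e)

ssr_cd, ssr_tl = ssr(X_cd, lnF), ssr(X_tl, lnF)
```

Since the Cobb-Douglas regressors are a subset of the translog regressors, the translog fit can never have a larger residual sum of squares, yet both remain approximations to the CES form.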


IV.   SOME  REMARKS 
Two points should be mentioned about the bound for inconsistency.
First, since the bound is in terms of a vector norm it is not possible to
infer about individual parameters. Second, the bound depends on unknown
quantities. It would be an interesting problem to investigate whether
we can estimate these quantities from the available information.

While testing nonlinearity for samples of large size, finding $n_0$
will require a huge amount of computational work. However, use of the
recursive relation for matrix inversion given in Brown et al. [2, p. 152]
can reduce the computational problems considerably.
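One way such a recursion can work is a rank-one update of $(X'X)^{-1}$ each time an observation is added to a subsample, which avoids a fresh inversion for every candidate $n_0$. The sketch below uses the Sherman-Morrison identity; that this is the same relation as the one in the cited reference is an assumption, and the function name and data are hypothetical.

```python
import numpy as np

def add_observation(XtX_inv, x):
    """Return (X'X + x x')^{-1} given (X'X)^{-1}, via the Sherman-Morrison identity."""
    v = XtX_inv @ x
    return XtX_inv - np.outer(v, v) / (1.0 + x @ v)

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 3))                  # hypothetical regressor matrix

# Start from the first 10 rows, then add the remaining rows one at a time
inv_r = np.linalg.inv(X[:10].T @ X[:10])
for r in range(10, 30):
    inv_r = add_observation(inv_r, X[r])

direct = np.linalg.inv(X.T @ X)               # full inversion, for comparison only
```

Each update costs on the order of K^2 operations instead of a full K x K inversion plus the cross-product, which is where the saving in the search for $n_0$ would come from.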